Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures
Similar resources
Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures
Manycore architectures – hundreds to thousands of cores per processor – are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive parallel programming model and an efficient runtime for the scheduling and coordination of concurrent tasks. A critical prerequisite for an efficient runtime is a scalable synchro...
Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures
Performance of manycore processors is limited by programs’ use of off-chip main memory. Streaming computation organized in a pipeline restricts main-memory reads and writes to the tasks at the boundaries of the pipeline. The Single Chip Cloud computer (SCC) offers 48 cores linked by a high-speed on-chip network, and allows the implementation of such an on-chip pipelined technique. W...
Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures
Based on the premise that preconditioners needed for scientific computing are not only required to be robust in the numerical sense, but also scalable for up to thousands of light-weight cores, we argue that this two-fold goal is achieved for the recently developed self-adaptive multi-elimination preconditioner. For this purpose, we revise the underlying idea and analyze the performance of impl...
Approximate weighted matching on emerging manycore and multithreaded architectures
Graph matching is a prototypical combinatorial problem with many applications in high performance scientific computing. Optimal algorithms for computing matchings are challenging to parallelize. Approximation algorithms are amenable to parallelization and are therefore important to compute matchings for large scale problems. Approximation algorithms also generate nearly optimal solutions that a...
Temperature-Aware Amdahl’s Law for Manycore Architectures
Small cores provide greater throughput per unit area and per watt when sufficient concurrency is available, motivating organizations with many simple cores. However, sufficient concurrency is often not available; even applications that can use many cores often have serial parts. Amdahl’s Law favors an asymmetric architecture and shows that one or more large, high-ILP cores are needed in these c...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2018
ISSN: 1045-9219
DOI: 10.1109/tpds.2017.2755655